A Generic Sentence Trimmer with CRFs

نویسنده

  • Tadashi Nomoto
چکیده

The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields (CRFs), to address improving the grammaticality of compression while retaining its relevance. Experiments found that the present approach outperforms in grammaticality and in relevance a dependency-centric approach (Oguro et al., 2000; Morooka et al., 2004; Yamagata et al., 2006; Fukutomi et al., 2007)− the only line of work in prior literature (on Japanese compression) we are aware of that allows replication and permits a direct comparison.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Citation Handling: Processing Citation Texts in Scientific Documents

Title of thesis: CITATION HANDLING: PROCESSING CITATION TEXTS IN SCIENTIFIC DOCUMENTS Michael Alan Whidby Master of Science, 2012 Thesis directed by: Professor Bonnie Dorr Dr. David Zajic Department of Computer Science Citation sentences (sentences that cite other papers) play a key role in the summarization of scientific articles. However, a citation-based summarization system that depends on ...

متن کامل

A Sentence-Trimming Approach to Multi-Document Summarization

We implemented an initial application of a sentence-trimming approach (Trimmer) to the problem of multi-document summarization in the MSE2005 and DUC2005 tasks. Sentence trimming was incorporated into a feature-based summarization system, called MultiDocument Trimmer (MDT), by using sentence trimming as both a preprocessing stage and a feature for sentence ranking. We demonstrate that we were a...

متن کامل

Sentence Compression as a Component of a Multi-Document Summarization System

We applied a single-document sentencetrimming approach (Trimmer) to the problem of multi-document summarization. Trimmer was designed with the intention of compressing a lead sentence into a space consisting of tens of characters. In our Multi-Document Trimmer (MDT), we use Trimmer to generate multiple trimmed candidates for each sentence. Sentence selection is used to determine which trimmed c...

متن کامل

Adding Redundant Features for CRFs-based Sentence Sentiment Classification

In this paper, we present a novel method based on CRFs in response to the two special characteristics of “contextual dependency” and “label redundancy” in sentence sentiment classification. We try to capture the contextual constraints on sentence sentiment using CRFs. Through introducing redundant labels into the original sentimental label set and organizing all labels into a hierarchy, our met...

متن کامل

Sentence boundary detection using sequential dependency analysis combined with CRF-based chunking

In spoken language, sentence boundaries are much less explicit than in written language. Since conventional natural language processing (NLP) techniques are generally designed assuming the sentence boundaries are already given, it is crucial to detect the boundaries accurately for applying such NLP techniques to spoken language. Classification frameworks, such as Support Vector Machines (SVMs) ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008